Coded Character Sets

A coded character set is a mapping from a set of abstract characters (the character repertoire) to a set of integers. These integers lie within a range that can be expressed by a bit pattern of a particular size: 7 bits, 8 bits, 16 bits, and so on. Each integer in the set is called a code point. The set of integers may be larger than the character repertoire; that is, some code points may be "unassigned" and correspond to no character in the repertoire. Examples of coded character sets include:

ASCII, whose code points are fixed-width 7-bit values
ISO 8859-1 (Latin-1), whose code points are fixed-width 8-bit values
JIS X 0208, a Japanese standard whose code points are fixed-width 14-bit values (normally represented as a pair of 7-bit values). Many other standards for East Asian languages follow a similar pattern, using code points represented as two or three 7-bit values. These standards are typically not used directly but through one of the character encoding schemes discussed in Character Encoding Schemes.
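To make the terminology concrete, the following Python sketch models a coded character set as a table from characters to code points of a fixed bit width. The ASCII and Latin-1 tables are built from Python's standard `ascii` and `latin-1` codecs; the `TOY` set is hypothetical and exists only to show unassigned code points. The JIS X 0208 part relies on the fact that EUC-JP stores each 7-bit half of a JIS X 0208 code point with its high bit set, so masking with 0x7F recovers the pair. The names `CodedCharacterSet`, `jis_pair_from_euc_jp`, and `pack_14_bit` are illustrative assumptions, not part of any library.

```python
from __future__ import annotations


class CodedCharacterSet:
    """A repertoire of characters mapped to integer code points of a fixed bit width."""

    def __init__(self, name: str, bits: int, mapping: dict[str, int]) -> None:
        self.name = name
        self.bits = bits
        self.mapping = mapping
        # Reverse table (code point -> character), used to test whether a code point is assigned.
        self.by_code_point = {cp: ch for ch, cp in mapping.items()}
        for cp in mapping.values():
            if not 0 <= cp < 2 ** bits:
                raise ValueError(f"code point {cp} does not fit in {bits} bits")

    def code_point(self, ch: str) -> int:
        """Return the code point of ch; raises KeyError if ch is not in the repertoire."""
        return self.mapping[ch]

    def is_assigned(self, cp: int) -> bool:
        return cp in self.by_code_point

    def unassigned_count(self) -> int:
        """Number of code points in the bit range that map to no character."""
        return 2 ** self.bits - len(self.by_code_point)


# ASCII: every 7-bit value 0..127 is assigned.
ASCII = CodedCharacterSet("ASCII", 7, {bytes([i]).decode("ascii"): i for i in range(128)})

# Latin-1 via Python's "latin-1" codec, which maps all 256 byte values
# (0x80-0x9F decode to C1 control characters).
LATIN_1 = CodedCharacterSet("ISO 8859-1", 8, {bytes([i]).decode("latin-1"): i for i in range(256)})

# Hypothetical 3-bit set: 8 possible code points, only 4 assigned.
TOY = CodedCharacterSet("toy", 3, {"A": 0, "B": 1, "C": 2, "D": 5})


def jis_pair_from_euc_jp(ch: str) -> tuple[int, int]:
    """Recover the two 7-bit JIS X 0208 values from a character's EUC-JP bytes.

    EUC-JP stores a JIS X 0208 code point as two bytes with the high bit set,
    so clearing that bit yields the original 7-bit pair.
    """
    hi, lo = ch.encode("euc_jp")
    return hi & 0x7F, lo & 0x7F


def pack_14_bit(hi: int, lo: int) -> int:
    """Concatenate two 7-bit values into one 14-bit value."""
    if not (0 <= hi < 0x80 and 0 <= lo < 0x80):
        raise ValueError("both halves must be 7-bit values")
    return (hi << 7) | lo


if __name__ == "__main__":
    print(ASCII.code_point("A"))                        # 65
    print(hex(LATIN_1.code_point("é")))                 # 0xe9
    print("é" in ASCII.mapping)                         # False: é is outside the ASCII repertoire
    print(TOY.unassigned_count(), TOY.is_assigned(3))   # 4 False
    hi, lo = jis_pair_from_euc_jp("あ")
    print(hex(hi), hex(lo))                             # 0x24 0x22
    print(hex(pack_14_bit(hi, lo)))                     # 0x1222
```

Packing the pair as (hi << 7) | lo mirrors the 14-bit description above. In practice JIS X 0208 codes are more often written with each 7-bit value in its own byte, so the character あ is usually quoted as 0x2422.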